Jiahao Ying

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

May 28, 2026

Disentangling Language and Culture for Evaluating Multilingual Large Language Models

May 30, 2025

FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities

May 19, 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Apr 26, 2025

Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law

Apr 10, 2025

SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia

Feb 10, 2025

EvoWiki: Evaluating LLMs on Evolving Knowledge

Dec 18, 2024

Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

Aug 21, 2024

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Jun 29, 2024

QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

Jun 19, 2024